Poster: Nemo_bis (Administrator, Curator, or Staff) Date: Mar 15, 2011 12:37am
Forum: opensource Subject: Re: Uploading JPG files to get an OCRed DJVU

Could this error be caused by the images being numbered from _0001 instead of _0000? I see that it counted 870 images instead of the correct 869.

Poster: hank_b (Administrator, Curator, or Staff) Date: Mar 15, 2011 11:39am
Forum: opensource Subject: Re: Uploading JPG files to get an OCRed DJVU

The problem with VocabolarioDellaLinguaItaliana is that you wrote JPG in upper case. (Ideally that shouldn't matter to our code, but as it happens, it does, and fixing that is more work than you might think.) The error:

filename "VocabolarioDellaLinguaItaliana_JPG.tar" doesn't match known patterns for PROCESSED_JPG archive filenames

occurred because it expected to find VocabolarioDellaLinguaItaliana_jpg.tar. Just changing the name of the tar file won't help, though, because then we'll still have trouble with all the internal filenames within the tar containing "JPG".
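To illustrate the point about internal filenames, here is a minimal Python sketch of repacking a tar locally so that both the archive name and every member name use lowercase "jpg". This is not the Archive's own code: the function names `lowercase_jpg` and `repack_lowercase` are hypothetical, and the rule of rewriting only "_JPG" and ".JPG" is an assumption about what the pattern matcher cares about.

```python
import tarfile


def lowercase_jpg(name: str) -> str:
    """Lowercase the '_JPG' suffix and '.JPG' extension in a path.

    Assumed rule for illustration; the real filename patterns are not
    documented in this thread.
    """
    return name.replace("_JPG", "_jpg").replace(".JPG", ".jpg")


def repack_lowercase(src_path: str, dst_path: str) -> int:
    """Copy a tar to a new tar, renaming each member; return the member count."""
    count = 0
    with tarfile.open(src_path, "r") as src, tarfile.open(dst_path, "w") as dst:
        for member in src:
            member.name = lowercase_jpg(member.name)
            # A stored pax 'path' header would override the new name on write.
            member.pax_headers.pop("path", None)
            # Only regular files carry data; directories are copied as headers.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
            count += 1
    return count
```

For the item in this thread, the call would be something like `repack_lowercase("VocabolarioDellaLinguaItaliana_JPG.tar", "VocabolarioDellaLinguaItaliana_jpg.tar")`, run before re-uploading.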

We *almost* have an easy solution for you, which I was planning to blog about shortly. You can see a preliminary version here:

http://raj.blog.archive.org/2011/02/24/new-upload-format-_images-zip-for-scribe-style-uploads/

The _images.zip format described there would be perfect for you - leave the individual files as-is, and pack them into a new zip named "VocabolarioDellaLinguaItaliana_images.zip", and our system would take it from there...except that we also have trouble with zips of more than 2 GB, which yours would be.
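As a rough illustration of that packing step, the following Python sketch builds "<item>_images.zip" from a folder of images and refuses to do so when the images exceed the size limit. The helper name `pack_images_zip`, the reading of "2 GB" as 2 * 1024**3 bytes, and the choice to store the files at the top level of the zip are all assumptions made for the example, not details confirmed by the blog post.

```python
import zipfile
from pathlib import Path

# Assumed interpretation of the "2 GB" limit mentioned above.
TWO_GB = 2 * 1024**3


def pack_images_zip(item_id, image_dir, out_dir, limit=TWO_GB):
    """Pack every file in image_dir, unchanged, into <item_id>_images.zip.

    Raises ValueError if the images together exceed `limit` bytes.
    """
    images = sorted(p for p in Path(image_dir).iterdir() if p.is_file())
    total = sum(p.stat().st_size for p in images)
    if total > limit:
        raise ValueError(
            f"{total} bytes of images exceeds the {limit}-byte zip limit"
        )
    out_path = Path(out_dir) / f"{item_id}_images.zip"
    # JPEGs are already compressed, so store them without recompressing.
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for p in images:
            zf.write(p, arcname=p.name)  # original filename kept as-is
    return out_path
```

With an item as large as the one discussed here, the size check would fail, which is the case the planned _images.tar format is meant to cover.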

This particular case - I was already looking at it before you posted here - has convinced me I need to implement an _images.tar format, too (we have no size limit on tars). Then you can just change the name of your tar to "VocabolarioDellaLinguaItaliana_images.tar", without even having to upload a replacement. Likewise for the other large tar in your item.

I hope to get to that within the next day or two, and plan to post here again when I do.

Poster: Nemo_bis (Administrator, Curator, or Staff) Date: Mar 15, 2011 1:32pm
Forum: opensource Subject: Re: Uploading JPG files to get an OCRed DJVU

Thank you very much for your help and great news! This new format is a big improvement.
In the meantime, since you've identified the issue, I don't mind renaming the files and re-uploading this archive with everything in lowercase, as long as "wait for admin" doesn't mean that everything will be locked or ruined.

Poster: hank_b (Administrator, Curator, or Staff) Date: Mar 15, 2011 2:13pm
Forum: opensource Subject: Re: Uploading JPG files to get an OCRed DJVU

No need to wait, then. Go right ahead and exchange your files for new versions (via the "Edit Item!" link). I can rerun the stuck derive whenever they're ready.

Poster: Nemo_bis (Administrator, Curator, or Staff) Date: Mar 15, 2011 6:25pm
Forum: opensource Subject: Re: Uploading JPG files to get an OCRed DJVU

Done. Thank you!

P.S.: Feel free to delete any unneeded files except the last one I uploaded (I don't know whether they're useful or harmful for the next derive).

This post was modified by Nemo_bis on 2011-03-16 01:25:23

Poster: Nemo_bis (Administrator, Curator, or Staff) Date: Mar 31, 2011 3:10am
Forum: opensource Subject: Re: Uploading JPG files to get an OCRed DJVU

Thank you. It produced quite a nice DjVu and the other derivatives, but it raised a new fatal error when trying to produce an ePub. I don't care about the ePub, but could you unlock the item so that I can upload other files and change metadata? Thank you!

Poster: benwbrum Date: May 28, 2011 7:04am
Forum: opensource Subject: Re: Uploading JPG files to get an OCRed DJVU

I wanted to mention that the new, fault-tolerant naming conventions work great! Thanks very much to Hank and the IA team for their hard work.
